What we will cover today:
Let's work with the lake trout data: the weights in this data set are striking and will back up the main points of this lecture.
The same approach translates easily to the mouse weight data from Vancouver or the pine needle data, and we can work through those on the fly if you want.
lake trout
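# Output options are already set in the setup chunk at the top, so they are not repeated here
# Load the lake trout data (read_csv() comes from readr, which is loaded with the tidyverse)
# The path is relative to the project folder; here::here("data", "lake_trout.csv") is an alternative
df <- read_csv("data/lake_trout.csv")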
# Examine the first few rows
head(df)
# A tibble: 6 × 5
  sampling_site species    length_mm mass_g lake
  <chr>         <chr>          <dbl>  <dbl> <chr>
1 I8            lake trout       515   1400 I8
2 I8            lake trout       468   1100 I8
3 I8            lake trout       527   1550 I8
4 I8            lake trout       525   1350 I8
5 I8            lake trout       517   1300 I8
6 I8            lake trout       607   2100 I8
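T-tests are parametric tests: they assume the data in each group are approximately normally distributed
Non-parametric tests: no assumption about the form of the underlying probability distribution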
Mukasa et al 2021 DOI: 10.4236/ojbm.2021.93081
# isl_ne12_df, ne12_box_plot and ne12_qq_plot are assumed to come from earlier chunks
length_ne12_box_plot <- isl_ne12_df %>%
  filter(lake == "NE 12") %>%
  ggplot(aes(x = lake, y = length_mm)) +
  geom_boxplot() +
  coord_flip()
length_ne12_qq_plot <- isl_ne12_df %>%
  filter(lake == "NE 12") %>%
  ggplot(aes(sample = length_mm)) +
  stat_qq(color = "steelblue") +
  stat_qq_line() +
  labs(title = "QQ Plot", x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
# Combine the panels with patchwork (+ places plots side by side, / stacks rows)
(length_ne12_box_plot + length_ne12_qq_plot) / (ne12_box_plot + ne12_qq_plot)
# T-test for length
# Perform standard t-test
t_test_length_result <- t.test(
  length_mm ~ lake,
  data = isl_ne12_df,
  var.equal = TRUE # Standard t-test with equal variance assumption
)
# Perform Welch's t-test (unequal variances)
welch_test_length_result <- t.test(
  length_mm ~ lake,
  data = isl_ne12_df,
  var.equal = FALSE # Welch's t-test
)
[1] "Standard t-test results for length_mm:"
Two Sample t-test
data: length_mm by lake
t = 8.616, df = 331, p-value = 2.888e-16
alternative hypothesis: true difference in means between group Island Lake and group NE 12 is not equal to 0
95 percent confidence interval:
270.1939 430.0761
sample estimates:
mean in group Island Lake mean in group NE 12
698.200 348.065
[1] "Welch's t-test results for length_mm:"
Welch Two Sample t-test
data: length_mm by lake
t = 9.0183, df = 9.6241, p-value = 5.309e-06
alternative hypothesis: true difference in means between group Island Lake and group NE 12 is not equal to 0
95 percent confidence interval:
263.1673 437.1026
sample estimates:
mean in group Island Lake mean in group NE 12
698.200 348.065
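The two outputs above report the same group means but very different degrees of freedom (331 versus about 9.6). A minimal sketch of where the Welch numbers come from, using simulated, hypothetical data rather than the lake trout data: the helper `welch_by_hand()` is an illustrative name, not part of the lecture code, and it applies the usual Welch statistic and Welch-Satterthwaite degrees of freedom so the result can be compared with `t.test()`.

```r
# Simulated, hypothetical data: one small, variable group and one large, tighter group
set.seed(42)
small_group <- rnorm(10, mean = 700, sd = 120)
large_group <- rnorm(300, mean = 350, sd = 60)

# Welch's t statistic and Welch-Satterthwaite degrees of freedom, computed by hand
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x) # squared standard error of mean(x)
  vy <- var(y) / length(y) # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df)
}

by_hand <- welch_by_hand(small_group, large_group)
built_in <- t.test(small_group, large_group, var.equal = FALSE)

print(by_hand)
print(c(t = unname(built_in$statistic), df = unname(built_in$parameter)))
```

When the small group also has the larger variance, its term dominates the degrees of freedom, which is why the Welch df can end up close to the size of the small group minus one.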
# T-test for mass
# Perform standard t-test
t_test_mass_result <- t.test(
  mass_g ~ lake,
  data = isl_ne12_df,
  var.equal = TRUE # Standard t-test with equal variance assumption
)
# Perform Welch's t-test (unequal variances)
welch_test_mass_result <- t.test(
  mass_g ~ lake,
  data = isl_ne12_df,
  var.equal = FALSE # Welch's t-test
)
[1] "Standard t-test results for mass_g:"
Two Sample t-test
data: mass_g by lake
t = 14.181, df = 330, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Island Lake and group NE 12 is not equal to 0
95 percent confidence interval:
2266.304 2996.360
sample estimates:
mean in group Island Lake mean in group NE 12
3165.0000 533.6677
[1] "Welch's t-test results for mass_g:"
Welch Two Sample t-test
data: mass_g by lake
t = 5.1368, df = 9.0578, p-value = 0.0006016
alternative hypothesis: true difference in means between group Island Lake and group NE 12 is not equal to 0
95 percent confidence interval:
1473.676 3788.989
sample estimates:
mean in group Island Lake mean in group NE 12
3165.0000 533.6677
[1] "Mann-Whitney U test results length:"
Wilcoxon rank sum test with continuity correction
data: length_mm by lake
W = 3226, p-value = 7.814e-08
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
262.0000 426.9999
sample estimates:
difference in location
357
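The output above has the layout printed by base R's `wilcox.test()`, but the chunk that produced it is not shown. A minimal sketch on simulated, hypothetical data follows; with the lecture data the call would presumably be `wilcox.test(length_mm ~ lake, data = isl_ne12_df, conf.int = TRUE)`, which is an assumption based on the confidence interval appearing in the output. The last lines illustrate what W measures when there are no ties.

```r
# Simulated, hypothetical lengths (mm); not the lake trout data
set.seed(1)
island <- rnorm(10, mean = 700, sd = 120)
ne12 <- rnorm(40, mean = 350, sd = 60)

# conf.int = TRUE adds the confidence interval and the estimated difference in location
mw_result <- wilcox.test(island, ne12, conf.int = TRUE)
print(mw_result)

# Without ties, W counts the (island, ne12) pairs in which the island fish is longer
w_by_hand <- sum(outer(island, ne12, ">"))
print(w_by_hand)
```

Because W depends only on which value in each pair is larger, a few extreme fish change it far less than they change a difference in means.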
Approach A vs. B
A: n = 15, y = 8, s = 4
B: n = 15, y = 10, s = 5
T-test: df = 28, t = -3.53, p = 0.0014
Mann-Whitney U (Wilcoxon's): W = 41, p = 0.002
[1] "Standard t-test results for length_mm:"
Two Sample t-test
data: length_mm by lake
t = 8.616, df = 331, p-value = 2.888e-16
alternative hypothesis: true difference in means between group Island Lake and group NE 12 is not equal to 0
95 percent confidence interval:
270.1939 430.0761
sample estimates:
mean in group Island Lake mean in group NE 12
698.200 348.065
[1] "Mann-Whitney U test results length:"
Wilcoxon rank sum test with continuity correction
data: length_mm by lake
W = 3226, p-value = 7.814e-08
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
262.0000 426.9999
sample estimates:
difference in location
357
perm package
library(perm)
# Prepare data for permutation test
ne12_perm_data <- isl_ne12_df %>%
  filter(lake == "NE 12") %>%
  pull(length_mm)
# Randomly sample exactly 25 observations from NE 12 (set seed for reproducibility)
set.seed(123)
ne12_perm_data <- sample(ne12_perm_data, size = 25, replace = FALSE)
island_perm_data <- isl_ne12_df %>%
  filter(lake == "Island Lake") %>%
  pull(length_mm)
# Calculate the observed difference in means
observed_diff <- mean(ne12_perm_data, na.rm = TRUE) - mean(island_perm_data, na.rm = TRUE)
# Perform permutation test for difference in means using perm package
permTS(ne12_perm_data, island_perm_data,
  alternative = "two.sided",
  method = "exact.mc",
  control = permControl(nmc = 10000))
Exact Permutation Test Estimated by Monte Carlo
data: GROUP 1 and GROUP 2
p-value = 2e-04
alternative hypothesis: true mean GROUP 1 - mean GROUP 2 is not equal to 0
sample estimates:
mean GROUP 1 - mean GROUP 2
-333.08
p-value estimated from 10000 Monte Carlo replications
99 percent confidence interval on p-value:
0.000000000 0.001059383
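To make the idea behind the Monte Carlo permutation test concrete, here is a minimal base R sketch on simulated, hypothetical data. It is not the perm package's implementation, and `permTS()` may define its two-sided p-value differently; the sketch only shows the general recipe of shuffling group labels and asking how often a shuffled difference in means is at least as extreme as the observed one.

```r
# Simulated, hypothetical lengths (mm); not the lake trout data
set.seed(123)
group_1 <- rnorm(25, mean = 350, sd = 60)
group_2 <- rnorm(10, mean = 700, sd = 120)

observed_diff <- mean(group_1) - mean(group_2)
pooled <- c(group_1, group_2)
n_1 <- length(group_1)

# Shuffle the pooled values and recompute the difference in means each time
n_perm <- 9999
perm_diffs <- replicate(n_perm, {
  shuffled <- sample(pooled)
  mean(shuffled[1:n_1]) - mean(shuffled[-(1:n_1)])
})

# Two-sided p-value: share of shuffles at least as extreme as the observed difference
# (the +1 counts the observed arrangement itself, so the estimate is never exactly 0)
p_value <- (sum(abs(perm_diffs) >= abs(observed_diff)) + 1) / (n_perm + 1)
print(observed_diff)
print(p_value)
```

The smallest p-value this sketch can report is 1 / (n_perm + 1), so the number of shuffles sets the resolution of the estimate.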